Abstract


This paper describes the kernel of a tool system which permits external 
tools to interface and which can serve as an integrating component in an 
existing CAD tool environment. This architecture largely avoids the need 
for duplicating existing tools. The system kernel provides a set of func- 
tions which form the basis for the development of novel design capabilities 
which will utilize AI technology. The design system operates on a worksta- 
tion which can serve as a host for the Connection Machine System. The 
Connection Machine is a massively parallel computer which offers general 
purpose acceleration capabilities and which provides high interactivity for 
the workstation user. 
Introduction


Significant advances have been made in recent years in the area of elec- 
tronic CAD (computer-aided design) tools. A large number of worksta- 
tions, software products and hardware accelerators have changed the way 
digital systems are designed. Digital system design is not new, however. 
A variety of design methods and personal design styles of designers have 
evolved over many years. Many companies, particularly those in the elec- 
tronics industry, have built up significant design capabilities. While many 
CAD products in today’s market show impressive performance in certain 
design areas, they do not and probably cannot address all design problems. 
Hence, a typical design center must operate with a combination of exist- 
ing tools, tools which are developed in-house and tools purchased on the 
market. This situation calls for “global” integration capabilities of tools. 
Often much of the productivity gain that a certain tool can provide for a 
particular design task is lost because of poor integration with the remain- 
der of the tools used in a given design process. Although there are many 
instances where CAD products of different vendors can be combined, there 
is still a large amount of duplication of products. With better integration 
capabilities many of these efforts could be channeled into the development 
of innovative products. As is well known, there is much room for improve- 
ment of tools at the level of systems design, for building tools which provide 
more automation, and for exploring design aids which utilize a substantial 
amount of knowledge about circuit design. 

The increasing size of designs of digital systems, in particular VLSI (very
large scale integrated) circuits, and the increase in sophistication of tools
lead to increasing demand for computing power. A design center needs to
be able to take advantage of new hardware products without having to move 
to an entirely new system of CAD tools. This brings up the requirement 
for portability of tools. Portability is, in this context, to be understood in 
a very general sense. In particular, graphical tools may have lots of code 
which is specific to a given hardware and which may have to be changed 
for a different hardware. However, the changes which are visible to the 
user should be kept minimal. A similar argument applies to the emerging 
generation of computers with new dimensions of parallelism. 

With regard to computing power, there is a special problem with
workstations. For handling very large designs, memory size and computation
speed quickly become a bottleneck. There is typically no easy way for a 
workstation user to obtain missing resources from some large mainframe or 
network server. Let us take a somewhat closer look at this problem. 

The combination of powerful processor chips and powerful raster graph- 
ics systems created an opportunity for new companies to develop single user 
workstations with much improved interactive capabilities compared to ter- 
minals on time sharing systems. Large mainframe manufacturers moved 
very slowly. Hence, the integration of these workstations with large main- 
frames was mostly poor. As a result, the access to existing tools which 
operate on other computers and the access to mainframe computing power 
was poor. Some of these problems will be eliminated with the advent of 
workstations provided by the same manufacturers as the mainframes and 
with the improvement of standard network interfaces. What will remain is 
a basic problem with distributed workstations, namely, lack of mainframe 
memory size and mainframe computing power. This causes problems with 
handling data bases of big designs and provides little interactivity for com- 
putation intensive subtasks which have to run on a different computer. 
There are several approaches to tackle this problem, such as the use of 
unused capacity on a network of workstations for big tasks. The most de- 
sirable solution, of course, may be to have a supercomputer workstation 
or a supercomputer which provides workstation capabilities, if you prefer.
Because of the need for integration of external tools, a crucial ingredient to 
any workstation is a solution to the networking problem. 

The remainder of this paper discusses a VLSI design system which is un- 
der development at Thinking Machines Corporation. This system provides 
an integrating component which permits the interface of external tools. The 
system also provides a basis for the development of tools which incorporate 
AI (artificial intelligence) technology. The workstation on which the tool
system operates can also be part of the Connection Machine Computer, a
massively parallel computer developed at Thinking Machines Corporation. 
The tool system also operates on the Connection Machine System which 
provides supercomputer performance and high interactivity. 
Overview of the VLSI Design System


A system kernel has been developed which permits the interface of existing 
tools and serves as an integrating component for those tools. The kernel 
operates on a workstation. It permits the interface of tools even if they 
operate on different computers to which just a network connection exists. 
This system kernel consists essentially of a design data base with a number 
of interactive tools. 

In order to be able to focus on the development of tools which clearly 
add new capabilities to circuit design tools and which eventually utilize 
AI technology, Lisp was chosen as the programming environment. The 
choice of Lisp was also influenced by the desire to have a programming 
environment for rapid prototyping. We believe that the tool system can 
be moved into other language environments with reasonable effort, but this 
has not yet been determined in sufficient detail. 

The workstation on which the tools operate can be an integral part of 
the Connection Machine Computer. In this framework, the Connection 
Machine is effectively used as a general purpose accelerator. However, un- 
like special purpose accelerators, it speeds up numerous computations of 
a typical design process, ranging from graphics, data base, and simulation to
layout applications. Unlike most other accelerators, the Connection Ma- 
chine offers high interaction capabilities and, if used with the data base of 
the kernel system, it avoids long compilation times for tasks to be acceler- 
ated. Tools of the system kernel can operate without a Connection Machine 
and the tools on the Connection Machine can be interfaced to other tool 
systems. However, a substantial synergism can be obtained by combining 
these tools. 


The Kernel of the Design System 
The Design Data Base 


The design data base is the central component of the tool system to which 
all other design tools are interfaced. It keeps the representation of a design 
and provides a number of access and checking functions to the design rep- 
resentation. This permits a design to remain consistent through all design 
phases. It significantly reduces errors and time for design iterations even 
though external tools may be used which require extensive data conversions
and which may run on another computer to which a poor communication 
link exists. 

The data base resides during a design session in the address space of 
the machine rather than in files on disks. The data base representation is 
accessed by programs like any other data object. This permits extremely 
fast interactions with the data base and makes possible many data base 
operations which would be prohibitive otherwise. Since extremely large 
amounts of space would be needed to keep all levels of hierarchy and all 
layers of design representations of a large design, it is possible to select parts 
of the data base. For instance, only a small number of design representation 
layers or levels of hierarchy need to be included for any given design task or 
set of design tasks. Also, it is possible to work just with a limited number 
of nestings in the design hierarchy. 


Design Representations Supported in the Design Data Base 


Design representations capture essentially the same design information as 
described, for example, in the EDIF document!”!. Special attention has 
been given to the problem of maintaining common data on different lay- 
ers of representation, e.g., names of nets which are used at the logic level 
are also used at various layout levels where appropriate. Support is pro- 
vided for using different hierarchical partitionings for the logic and layout 
representation. 

For each representation layer or view supported by the system, there 
exists a language which permits description of circuit representations in 
procedural form. Such a procedure describes how the circuit representation
is generated for a given set of parameters. This capability can be used, for 
instance, for describing circuits with varying numbers of inputs and outputs 
and different drive capabilities or it can be used to describe full macro 
compilers for ALUs. These concepts are a generalization and extension of 
concepts used in the DPL system!*!. The set of design languages in the tool 
system is extensible. 
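
As an illustration of a procedural circuit description, the following is a
minimal sketch in Python rather than in the Lisp-based design languages of
the system. The function name make_and_gate, the netlist format, and the
AND2 cell name are hypothetical and chosen only for this example. The
procedure generates an n-input AND from two-input gates, with the number of
inputs and the drive strength of the output stage as parameters.

```python
def make_and_gate(n_inputs, drive=1):
    """Generate a netlist for an n-input AND built from a chain of 2-input ANDs.

    The netlist is a list of gate records (plain dictionaries). Only the
    final stage receives the requested drive strength; inner stages use 1.
    """
    if n_inputs < 2:
        raise ValueError("an AND gate needs at least two inputs")
    gates = []
    previous = "in0"
    for i in range(1, n_inputs):
        last_stage = (i == n_inputs - 1)
        output = "out" if last_stage else f"n{i}"
        gates.append({
            "type": "AND2",                 # hypothetical library cell name
            "inputs": (previous, f"in{i}"),
            "output": output,
            "drive": drive if last_stage else 1,
        })
        previous = output
    return gates


if __name__ == "__main__":
    for gate in make_and_gate(4, drive=2):
        print(gate)
```

Calling the same procedure with different parameters yields different
circuit instances, which is the sense in which a circuit description and a
circuit generator coincide.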

Special emphasis has been given to features which permit modularized 
designs and circuit libraries in a way that a generic design can be readily 
implemented with different circuit libraries.
“Local” Integration of Tools


Local integration refers to tools which are a part of the tool system. Tools 
in the system can be roughly partitioned into analysis tools (e.g., simula- 
tors) which operate on a given level of representation and into translation 
tools which translate from one layer of representation into another (e.g., 
placement and routing). Depending on the design style (e.g., standard cell 
or gate-array), different tools can be used for a given design step. The 
tool system also supports, for a given design step, tools with varying degrees of
automation; e.g., instead of an automatic placement program, the designer 
can use an interactive floorplanning and placement tool. It is even possible 
to combine these tools. For instance, a designer can use the floorplanning 
tool for providing placement information for the top levels of the hierarchy 
and can then let an automatic tool complete the detailed placement within 
the constraints provided by the floorplanning tool. 


Utilization of Features of the Lisp Environment for Circuit Design 


A number of features of the Lisp environment add substantially to the 
functionality of the design system. Among those features are incremental 
compilation, the possibility of having data objects that live longer than the
run of a program or function, powerful debugging capabilities, and powerful 
text editors. 


A “Programming Approach” to Circuit Design 


The procedural design capability removes the distinction between circuit 
design and tool design. We saw the case of a module generator which can 
be viewed as a general circuit or a tool which generates the circuit. Circuit 
descriptions and tools are made of the same stuff, namely, Lisp programs. 
While procedural design capabilities are available in many design systems, 
in a Lisp environment all the good tools for program development can also 
be utilized for circuit development. In fact, with the highly interactive 
tools like simulators, circuit design is very similar to program development. 
Once a procedure which describes a circuit has been designed, the designer 
can exercise the circuit with the simulator and some test vectors in much 
the same way a programmer executes his programs with some test data.
This encourages designing and debugging circuits interactively. 

A consequence of this programming approach to circuit design is that 
lots of useful functions can be developed by a designer to tailor the tools 
to his particular design style or to a particular project. More importantly, 
these capabilities add to the adaptability of the tool system to technology 
variations. 

The methods used for integration of tools permit the efficient use of 
tools which provide a continuum from handcrafting capabilities to fully 
automatic design. We believe that this kind of integration which permits 
the use of different tools cooperatively rather than exclusively is an essential 
ingredient for incorporating more “smartness” into the tool system by using 
AI technology. The design system does not have very many “smart” tools
yet. However, with the basis that has been provided, we can move rapidly 
to incorporate such tools without having to change the basis of the tool 
system. 


The Connection Machine as a General Purpose 
Accelerator 


The major contributions of workstation hardware to VLSI CAD systems 
are fast interactive capabilities and high quality graphics capabilities. As 
pointed out before, problems arise if design problems have to be solved 
which exceed the limits of resources of workstations. Special purpose hard- 
ware accelerators] can provide some relief, but only for very specific 
design tasks. Overall improvement of computing capacity usually requires 
access to some general purpose mainframe computer. If the mainframe is 
needed in order to run specific computation intensive tasks, then the in- 
teractivity is significantly reduced for those subtasks because interactions 
have to go through a network. For example, interactions with a simulator running
on a host via a workstation are not nearly as good as those with a simulator running
on a workstation. If the mainframe is needed also because the workstation 
cannot store the design anymore, then the overall interactive speed of the 
workstation will be even more reduced. The effectiveness of the worksta- 
tion is then reduced to its graphical display capabilities. The Connection 
Machine provides a solution to this problem. A detailed discussion of the 
Connection Machine can be found in [1]. We summarize the hardware and
software concepts which are relevant for the purposes of this paper. No- 
tice that the following is a simplified and incomplete description of the 
Connection Machine. 


System Overview 


The Connection Machine can be viewed as a large collection of processing 
elements which are connected to a communication system. Each processing 
element consists of a processor and memory. The communication system 
permits each processor to access, in addition to its own memory, the mem- 
ory of any other processor in the entire machine. The Connection Machine 
is interfaced to the memory bus of a standard computer called the host. The 
host can access the entire memory of the Connection Machine in the same 
way as it accesses its own memory. The host also provides the instructions
which are to be executed by the processing elements and reads signals from
the Connection Machine which control the program flow. This interface
permits the use of the entire infrastructure of the host computer system, for 
example, the operating system, the file system, networking facilities, etc. 
The Connection Machine has a very fast swapping disk. Notice that this 
provides a rather elegant solution to two problems with existing worksta- 
tions. It solves the interface problem at the hardware and software level. It 
also solves problems with computing resources for handling large designs. 
The Connection Machine has 64,000 one-bit processors. 


Programming Concepts 


The essential changes required to the software environment of the host 
in order to use the Connection Machine are the following changes to a 
programming language like C or Lisp. 

A type of records is introduced which has the same properties as before, 
but which will be located in one processing element when an instance is 
created. If required by a given problem, the number of processing elements 
can be increased beyond the number of physical processors in the machine 
by using virtual processors. 

Pointers are from a programming viewpoint as before. However, if
pointers are used between records which reside in processing elements, then
references along those pointers involve data movement through the routing
system of the machine. A programmer can ignore this from a conceptual 
viewpoint; however, he may wish to distinguish those pointers from local 
memory pointers for program optimization. 

Operations on basic data objects like integers, reals, strings, etc. are 
extended to operations which can be performed in parallel on all objects or 
some subset of objects which reside in processing elements of the Connection 
Machine. 

A special construct is provided to select a subset of records which reside 
in processing elements such that parallel operations are only performed on 
selected objects. 

Notice that most of the essential parameters of the hardware can remain 
transparent to the programmer. Thus, these parameters can change in 
order to take advantage of technology advances or design improvements 
without affecting programs which are written for the Connection Machine. 
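
The parallel operations and the selection construct described above can be
emulated sequentially. The following is a minimal sketch in Python; it is
not the actual Connection Machine language extension, and the name
parallel_where is hypothetical. Each list element stands for the field of
one record in one processing element, the mask plays the role of the
selection construct, and the operation is applied only to selected elements.

```python
def parallel_where(values, mask, operation):
    """Emulate a selected data-parallel operation.

    values:    one entry per processing element
    mask:      True where the processing element is selected
    operation: function applied to every selected entry

    All results are computed from the old values, as if every selected
    processing element executed the same instruction at the same time.
    """
    return [operation(v) if selected else v
            for v, selected in zip(values, mask)]


if __name__ == "__main__":
    delays = [3, 7, 2, 9]
    slow = [d > 5 for d in delays]           # select a subset of the records
    print(parallel_where(delays, slow, lambda d: d - 1))
```

Note that nothing in this sketch depends on the number of physical
processors, which reflects the transparency of hardware parameters
mentioned above.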


Overview of VLSI Design Tools on the Connection Machine 


We first list and briefly comment on tasks in a workstation which can 
greatly benefit from the computing power provided by the Connection Ma- 
chine. Then we discuss the parallel implementation of several algorithms 
in some detail. Many algorithms are still in an experimental stage and we 
hope to publish them in detail elsewhere. We will discuss simplified cases, 
however, in order to explain how these algorithms can be parallelized on 
the Connection Machine. It is essential to keep in mind that the Connec- 
tion Machine can be looked at as a general purpose accelerator, which does 
not just speed up a specific design task but a whole range of computation 
intensive tasks. Furthermore, all tools which run on the Connection Ma- 
chine are highly interactive and share data structures with the data base. 
Some of the components of a tool system which can utilize the Connection 
Machine are the design data base, the graphics display system, simulation 
at different levels, placement and routing, different types of layout pro- 
cessing such as design rule checking, automatic spacing, circuit extraction, 
and generation of mask data for E-beam machines. Other areas for which 
the Connection Machine seems to be promising are test generation, logic 
optimization, etc.


Simulation 


The Connection Machine can be used for all types of simulation, like func- 
tional level simulation, gate level simulation, switch level simulation, and 
circuit simulation. 

For gate level simulation, for instance, each gate is represented by 
a record which corresponds to one processing element. Connections be- 
tween gates are represented by pointers. All gates whose input values have 
changed can compute their new output values in parallel. Changes of out- 
put signals are propagated to the inputs of connected gates by moving data 
along the pointers. The propagation of this data utilizes the routing net- 
work of the Connection Machine. While there are special purpose systems 
for logic simulation which are probably faster during the actual simulation 
loop, the Connection Machine offers two distinct advantages. There is no 
complicated and time consuming compilation of netlists necessary which are 
downloaded into a special purpose accelerator®:*!. The Connection Machine 
operates on the same data objects which are used in the memory resident 
design data base. In fact, since the memory of the Connection Machine 
looks to the host like its own memory, it can share the data objects with 
the design data base. This makes any netlist compilation unnecessary and 
it provides all the powerful data access capabilities which can be used by 
the designer to interact with the simulator for debugging purposes. 
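
The gate level scheme described above can be sketched sequentially as
follows. This is a simplified illustration in Python under assumed
conventions, not the implementation on the Connection Machine: each gate is
named after its output net, has exactly two inputs, and has unit delay. The
names GATE_FUNCS and simulate are hypothetical. In every step, all gates
whose inputs changed are evaluated from the same snapshot of net values,
which emulates their parallel evaluation.

```python
GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def simulate(gates, inputs, max_steps=100):
    """Propagate values through a netlist until no gate output changes.

    gates:  dict mapping output net -> (function name, input net, input net)
    inputs: dict mapping primary input net -> 0 or 1
    """
    values = dict(inputs)
    for name in gates:
        values.setdefault(name, 0)        # assumed initial gate outputs
    active = set(gates)                   # evaluate every gate once at first
    for _ in range(max_steps):
        if not active:
            break
        # All active gates compute their outputs from the same old values.
        new = {g: GATE_FUNCS[gates[g][0]](values[gates[g][1]],
                                          values[gates[g][2]])
               for g in active}
        changed = {g for g, v in new.items() if v != values[g]}
        values.update(new)
        # Changes travel along the connections to the fan-out gates.
        active = {g for g, (_, a, b) in gates.items()
                  if a in changed or b in changed}
    return values


if __name__ == "__main__":
    netlist = {
        "sum": ("XOR", "a", "b"),
        "carry": ("AND", "a", "b"),
        "any": ("OR", "sum", "carry"),
    }
    print(simulate(netlist, {"a": 1, "b": 1}))
```

The simulator works directly on the dictionary that describes the design,
which mirrors the point made above that no separate netlist compilation
step is needed.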

Circuit simulation on the Connection Machine is done by using iterative 
techniques called relaxation techniques*97*). Relaxation techniques handle
most MOS circuits quite well. However, convergence tends to be slow if cir- 
cuits contain feedback loops or if tight coupling exists between nodes. For 
these cases, alternate algorithms are under investigation which use direct 
methods for solving the sparse matrices in the inner loop of the algorithm.
In this context new techniques for solving sparse matrices on a massively 
parallel computer’! are being studied. The implementation of the relax- 
ation algorithm on the Connection Machine is similar to the basic algorithm 
of the program RELAX!!®!]. However, it uses Gauss-Jacobi relaxation at
the level of nonlinear equations!!”) instead of waveform relaxation. The im- 
plementation uses essentially one processor per net and one processor per 
part. For each time point the following steps are iterated. The parameters
for the current operating points of the devices are computed. Devices are 
linearized at their current operating point. This computation is done for 
all devices in parallel. This information is then sent to the nodes. Message 
passing is done in parallel. This step corresponds to setting up the matrix 
for solving the system of equations. Then the new voltages for the nodes 
are computed. Again this is done for all nodes in parallel. Initial results 
show that circuits with up to approximately 10,000 nodes can be simulated 
in nearly constant time. 
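
The node update of a Gauss-Jacobi iteration can be sketched as follows.
This Python example is a simplified illustration, not the implementation
described above: it assumes that the devices have already been linearized,
so that one time point reduces to a linear system G v = i with a
conductance matrix G, and it uses a dense matrix for brevity. The names
jacobi_step and jacobi_solve are hypothetical. Each new node voltage
depends only on the old voltages of the other nodes, which is why all nodes
can be updated in parallel.

```python
def jacobi_step(G, rhs, v):
    """Compute new node voltages from the old ones (one Jacobi sweep)."""
    n = len(v)
    return [
        (rhs[i] - sum(G[i][j] * v[j] for j in range(n) if j != i)) / G[i][i]
        for i in range(n)     # every i is independent: one processor per node
    ]


def jacobi_solve(G, rhs, tol=1e-9, max_iter=1000):
    """Iterate until the largest voltage change drops below tol."""
    v = [0.0] * len(rhs)
    for _ in range(max_iter):
        new_v = jacobi_step(G, rhs, v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


if __name__ == "__main__":
    # Two nodes coupled by a unit conductance, each with a unit conductance
    # to ground, and a unit current injected into the first node.
    G = [[2.0, -1.0], [-1.0, 2.0]]
    print(jacobi_solve(G, [1.0, 0.0]))
```

The iteration converges quickly for this diagonally dominant example.
Strong coupling between nodes weakens the diagonal dominance, which is
consistent with the slow convergence noted above for tightly coupled
circuits.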


Circuit Placement and Routing 


A placement technique which seems to work with consistently reasonable 
results on a wide variety of placement problems is the simulated annealing
technique!!?!4:15]. However, simulated annealing is very computation
intensive on a sequential machine. We are experimenting with several variations
of the annealing algorithm on the Connection Machine. Preliminary results 
show substantial speedups for a number of placement problems. 

Recall that the inner loop of the annealing algorithm consists of
generating a new state, calculating the change of the cost function for this
new state, and deciding whether to accept or reject the new state. In a
very simple case of circuit placement optimization a new state is generated 
by swapping two parts. If the cost function measures the total wire length, 
then the corresponding change in wire length needs to be computed for 
the swap. The intuitive idea of speeding this process up with a parallel 
computer is to do many of these inner loops in parallel. If more than one
pair of parts is chosen randomly, then there is a chance that different pairs 
have a common wire or a common net. Thus the change in wire length 
can no longer be computed independently for each swap. However, in any 
given circuit there are generally many pairs of parts which do not share a 
common net and thus could be swapped in parallel. Let us call these pairs
independent. Notice that in general there is for a given circuit no
unique set of pairs which are pairwise independent. One way to do many
inner loops in parallel is as follows:


1. Randomly select a large number of pairs which are called candidates 
for swaps. 
2. Eliminate pairs until all remaining pairs are pairwise independent.


3. Calculate the change in cost for all swaps and decide whether to 
accept or reject the swap. 


We now outline an implementation of these steps on the Connection 
Machine. In order to show the essence of the algorithm, we will describe 
a largely simplified version. Assume each part is stored in one processing 
element and has pointers to all parts to which it is connected. 

In step 1, for each part a random number is generated which decides 
whether the part will search for a partner or not. Each part now generates 
a random number which identifies the candidate for forming a pair and 
sends a message to this part. If a part receives a message from more than
one part then it responds to the part with the highest priority. We assume 
that each part has a unique number which is its priority. The priority of 
a part changes during the annealing process. We assume for simplicity
that the parts which were originally selected do not respond to
messages. Now a set of pairs is formed. We assume that the message 
exchange included an exchange of coordinate positions. Each pair now 
assumes as its priority the higher of the priorities of its two partners.

In step 2, each of the parts of the selected pairs sends a message to all 
parts it is connected to except for its partner. If a part receives a message 
with a priority which is higher than its own, then it sends a message to its 
partner to become inactive and it sets itself inactive. If there is a chain 
of pairs, then with this algorithm all but one pair of the chain become
inactive. We will not discuss the solution to this. At the end of step 2, we 
are left with a set of independent pairs and we can do step 3 completely in 
parallel. 

Notice that step 1 is done in parallel. However, some overhead is in- 
curred over the sequential case since conflicts during the pairing process 
must be resolved. Step 2 is overhead which does not occur in the sequen- 
tial form of the algorithm. 
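
Steps 2 and 3 can be sketched sequentially as follows. This Python example
is an illustration under stated assumptions, not the message-based scheme
outlined above: candidate pairs are assumed to be given in order of
descending priority, and a greedy pass keeps a pair only if it shares no
net with a pair that was kept earlier. The acceptance rule shown is the
standard Metropolis criterion, which the text above does not specify. The
names independent_pairs, accept, and nets_of are hypothetical.

```python
import math
import random


def independent_pairs(pairs, nets_of):
    """Keep candidate pairs such that no two kept pairs share a net.

    pairs:   list of (part, part) tuples, highest priority first
    nets_of: dict mapping each part to the set of nets it is connected to
    """
    used_nets = set()
    kept = []
    for a, b in pairs:
        nets = nets_of[a] | nets_of[b]
        if nets & used_nets:
            continue                 # conflicts with a higher priority pair
        kept.append((a, b))
        used_nets |= nets
    return kept


def accept(delta_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept losses."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


if __name__ == "__main__":
    nets_of = {"a": {"n1"}, "b": {"n2"}, "c": {"n2", "n3"},
               "d": {"n4"}, "e": {"n5"}, "f": {"n6"}}
    candidates = [("a", "b"), ("c", "d"), ("e", "f")]
    print(independent_pairs(candidates, nets_of))
```

Because the kept pairs share no net, the change in wire length of each swap
can be computed and the swap accepted or rejected without regard to the
other swaps.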

A number of different routing algorithms are under investigation for 
the Connection Machine. A nice example of a routing algorithm which 
can take advantage of the parallel processing capabilities of the Connection 
Machine is the Lee Algorithm!®!. We outline a very simplified version of 
this algorithm on the Connection Machine. Assume the surface of a chip is 
decomposed into square areas. Each area is represented by one processing
element. For each square it is known which nets terminate in it and which 
wiring tracks are available. A path for a wire is then determined as follows. 
The processing element at one end of the wire starts the search process 
by sending a message to all four of its neighbors. The receiving processing 
elements mark a wiring track, if one is available, with the number 1 (Lee
calls this number the mass). Those processing elements which have a wiring
track then store a pointer to the processor which sent the message. Then
they increase the mass and send a message to all of their neighbors except
to the neighbor which sent a message to them. Notice that, after the search
process started, a processing element may receive messages from more than
one processing element. In that case, if it has an available track, it stores 
pointers to both sending processors, and sends messages to all neighbors 
except those from which it received messages. If there is a path, then 
the search process terminates when the processing element at the second 
terminal of the wire receives a message. With the help of the backpointers, 
starting from the destination processor, an optimal path is selected. Notice 
that if a complete layout is stored in the Connection Machine, a number 
of nets can be wired in parallel using the Lee Algorithm if the search area 
for a given net is constrained and if the search areas for nets to be wired 
in parallel do not overlap. Since the parallel computation capabilities can 
also be utilized to determine a set of nets which can be routed in parallel, 
the overhead for this computation can be kept small. 
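
The wavefront search and the backtrace can be sketched sequentially as
follows. This Python example is a simplified illustration, not the
implementation on the Connection Machine: each grid cell is assumed to
offer at most one wiring track, a cell stores only the first backpointer it
receives rather than pointers to all senders, and the mass is implicit in
the number of expansion rounds. The name lee_route is hypothetical. The
loop over the frontier corresponds to the work that all processing elements
on the wavefront would do in parallel.

```python
def lee_route(grid, start, goal):
    """Find a shortest path on a grid by wavefront expansion.

    grid:  list of rows; grid[r][c] is True if the cell has a free track
    start: (row, col) of the first terminal, assumed to be free
    goal:  (row, col) of the second terminal
    Returns the list of cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    back = {start: None}              # backpointer of every reached cell
    frontier = [start]
    while frontier and goal not in back:
        next_frontier = []
        for r, c in frontier:         # conceptually all cells act at once
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] and (nr, nc) not in back):
                    back[(nr, nc)] = (r, c)
                    next_frontier.append((nr, nc))
        frontier = next_frontier
    if goal not in back:
        return None
    path = []
    cell = goal
    while cell is not None:           # follow backpointers to the start
        path.append(cell)
        cell = back[cell]
    return path[::-1]


if __name__ == "__main__":
    grid = [[True, True, True],
            [False, False, True],
            [True, True, True]]
    print(lee_route(grid, (0, 0), (2, 0)))
```

The number of expansion rounds equals the length of the shortest path, so
on a machine with one processing element per cell the search time grows
with the path length rather than with the area searched.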


Circuit Mask Layout Processing 


As an example of this kind of processing, we select the problem of reporting 
intersections between shapes. This problem is a common subproblem which 
needs to be solved in a number of layout processing algorithms like design 
rule checking, circuit extraction, processing of E-beam data, etc. In order 
to give a flavor of how the parallel processing capabilities of the Connection 
Machine apply to this problem, we restrict our discussion to rectangular 
shapes. One way for solving this problem is to use a raster approach. It 
should be noted that other approaches are being investigated as well, particularly
for the case of arbitrary shapes. We assume that the coordinate space has 
been compacted, as described for instance in [17], in case the dimensions of